Steve Ardire on LinkedIn: McDonald's is selling Krispy Kreme doughnuts in one test market. I tried…


Should you scale categorical variables in machine learning?

This is a common question that I get in my course "Feature Engineering for Machine Learning." The short answer is: it depends. Let's break it down.

👉 Most machine learning algorithms require numerical inputs. Hence, we need to transform categorical variables, which are usually strings, into numerical features using techniques like one-hot encoding, ordinal encoding, target encoding, and weight of evidence, among others.

👉 Most machine learning models also require feature scaling (decision-tree-based models are the exception). So, what should we do with categorical variables?

✅ If we used one-hot encoding, the variables are already scaled between 0 and 1, so in principle we do not need further scaling. In fact, if we apply min-max scaling, we obtain exactly the same variable (first sketch below).

▶ However, if we apply standard scaling to a binary variable, the result changes with the variable's distribution (second sketch below):
▶ If the dummy variable has 50% 0s and 50% 1s, standardization turns the original values into -1 and +1, respectively. Originally there was a unit difference between the levels; after the transformation, the difference is 2 units on the normalized scale.
▶ If the dummy variable has 90% 0s and 10% 1s, the standardized variable takes the values -0.33 and 3, a difference of 3.33 units on the normalized scale.
▶ So with standardization, we distort a somewhat arbitrary distance of 1 unit between categories into values that vary with the prevalence of the category.

✅ If we use target encoding for binary classification, we also obtain a variable scaled between 0 and 1, so in principle we do not need to scale it either (third sketch below).

✅ If we use the weight of evidence, we would normally also discretize numerical variables and then encode those intervals with the weight of evidence; hence, all our variables end up on the same scale, the logit scale (fourth sketch below).

✅ For encoding methods that do not leave the variable scaled between 0 and 1, or between -1 and 1, such as ordinal encoding, or target encoding when the target is continuous, we should scale the variable before training our models (last sketch below).

Overall, the decision to scale categorical variables depends on the machine learning algorithm being used and on the encoding method that transforms the variable into a numerical feature.

Standardization or min-max scaling? Most categorical variables do not have a natural numerical order, so statistical parameters like the min, max, mean, or standard deviation do not make much sense for them. Hence, we should choose the scaling method that works best for the problem at hand and returns the best-performing model.

Additional reference: https://bit.ly/3lnLk1A

#MachineLearning #CategoricalVariables #DataScience #DataPreprocessing

What's your default approach with categorical variables? Let me know in the comments 👇
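First sketch: min-max scaling a one-hot encoded variable returns exactly the same 0/1 values. A minimal illustration with scikit-learn (the toy data and column are invented for this example):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy categorical column (values are made up for illustration).
colors = np.array([["red"], ["blue"], ["red"], ["green"]])

# One-hot encoding already produces columns whose values are 0 or 1.
# sparse_output=False needs scikit-learn >= 1.2 (older versions use sparse=False).
dummies = OneHotEncoder(sparse_output=False).fit_transform(colors)

# Min-max scaling maps the column minimum to 0 and the maximum to 1,
# so the one-hot columns come out unchanged.
scaled = MinMaxScaler().fit_transform(dummies)
assert np.array_equal(dummies, scaled)
```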
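Second sketch: the arithmetic behind the standardization bullets, reproduced with NumPy (the sample sizes are invented to match the stated prevalences):

```python
import numpy as np

def standardize(x):
    # Standard scaling: subtract the mean, divide by the (population) std.
    return (x - x.mean()) / x.std()

balanced = np.array([0] * 50 + [1] * 50)  # 50% 0s, 50% 1s
skewed = np.array([0] * 90 + [1] * 10)    # 90% 0s, 10% 1s

print(np.unique(standardize(balanced)))  # [-1.  1.]    -> spread of 2 units
print(np.unique(standardize(skewed)))    # [-0.33  3.]  -> spread of ~3.33 units
```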
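Third sketch: target (mean) encoding with a binary target, computed by hand with pandas (data and column names are invented); each category becomes the fraction of positives within it, which is by construction between 0 and 1:

```python
import pandas as pd

df = pd.DataFrame({
    "city":   ["NY", "NY", "SF", "SF", "SF", "LA"],  # made-up data
    "bought": [1, 0, 1, 1, 0, 0],                    # binary target
})

# The mean of a 0/1 target per category is always between 0 and 1.
# In practice, learn these means on the training set only, to avoid leakage.
means = df.groupby("city")["bought"].mean()
df["city_encoded"] = df["city"].map(means)
print(df)  # NY -> 0.5, SF -> 0.67, LA -> 0.0
```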
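Fourth sketch: the weight of evidence computed by hand with pandas (toy data; this uses one common definition, the log of the share of positives over the share of negatives per category, which is an assumption rather than the post's exact formula):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "segment": ["A", "A", "A", "B", "B", "B", "C", "C"],  # made-up data
    "default": [1, 1, 0, 1, 0, 0, 1, 0],                  # binary target
})

# Share of positives and share of negatives falling in each category.
pos = df.loc[df["default"] == 1, "segment"].value_counts(normalize=True)
neg = df.loc[df["default"] == 0, "segment"].value_counts(normalize=True)

# Weight of evidence: the log of the ratio of those shares (logit scale).
woe = np.log(pos / neg)
df["segment_woe"] = df["segment"].map(woe)
print(woe)  # A -> ln(2) ~ 0.69, B -> ln(0.5) ~ -0.69, C -> 0.0
```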
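Last sketch: for encoders that return an arbitrary integer scale, ordinal encoding followed by standard scaling inside a scikit-learn pipeline (categories and data are invented):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

sizes = np.array([["S"], ["M"], ["L"], ["XL"], ["M"]])  # made-up data

# Ordinal encoding maps the categories to 0..k-1, an arbitrary scale,
# so we add a scaling step before training the model.
pipe = make_pipeline(
    OrdinalEncoder(categories=[["S", "M", "L", "XL"]]),
    StandardScaler(),
)
print(pipe.fit_transform(sizes))
```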


